Nature Biotechnology
Springer Science and Business Media LLC
All preprints, ranked by how well they match Nature Biotechnology's content profile, based on 147 papers previously published here. The average preprint has a 0.34% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.
Raabe, N. J.; Griffith, M. P.; Rangachar Srinivasa, V.; Waggle, K. D.; Sundermann, A. J.; Pless, L.; Snyder, G. M.; Brooks, M. M.; Van Tyne, D.; Harrison, L. H.
Plasmids are extrachromosomal mobile genetic elements that often carry genes responsible for antimicrobial resistance. Plasmid epidemiology aims to track the evolution and spread of plasmids, but the field currently faces significant barriers that make practical implementation using whole genome sequence data difficult. Hybrid-assembled genomes remain the most reliable way to identify and track complete plasmids; however, most genomic surveillance data exists in the form of short-read sequencing, which lacks the resolution required to accurately resolve plasmids. Despite recent advances, long-read-only assemblies have not yet reached the consistency seen in hybrid assemblies. The ideal approach to plasmid epidemiology using whole genome sequence data would consider the limitations of sequencing technologies and the constraints of existing genomic surveillance infrastructure, in addition to the unique evolutionary biology of plasmids. Here, we present ACCIO (Assembly-based Circular Contig Identification for Outbreaks), a tool that creates a reference plasmid database and uses it to infer which plasmids, and genetically related plasmid groupings, are present in an input assembly (Illumina, Nanopore, or hybrid assembly). We validated ACCIO using an internal dataset of 303 plasmid-harboring bacterial clinical and surveillance isolates collected from a single acute tertiary care center. When highly related database plasmids were grouped together, ACCIO achieved 100% sensitivity and 92.1% positive predictive value (PPV) for detection of plasmid groups using hybrid assemblies, and comparably strong performance for Illumina (93.0% sensitivity, 86.6% PPV) and Nanopore (79.3% sensitivity, 91.4% PPV) assemblies. Evaluation on three external datasets yielded consistently high performance. 
Finally, when benchmarked against MOB-suite, a tool for reconstruction and typing of plasmids, ACCIO demonstrated superior performance across nearly all assembly types and plasmid grouping levels. By integrating database construction, clustering, and plasmid calling into a single workflow compatible with all major sequencing platforms, ACCIO is intended to help advance plasmid epidemiology beyond its current technological and infrastructural barriers.
3. Impact statement
Detecting and tracking plasmids, the mobile genetic elements often responsible for spreading antimicrobial resistance in hospital settings, is challenging, particularly when relying on short-read sequencing data alone. Short-read genome assemblies, despite widespread use in surveillance of bacterial pathogens, inherently lack the resolution required for plasmid analyses. Current bioinformatic methods struggle to identify whole plasmids from short-read assemblies alone, and often, hybrid assembly using both short- and long-read data is required for the robust analyses that are essential for tracking plasmids. To address these challenges, we developed ACCIO, a bioinformatics tool that uses input genome assemblies (short-read, long-read, or hybrid assemblies) to assess the plasmid content of clinical bacterial isolates for epidemiologic purposes. We validated its use against the recovery of circular plasmid sequences from hybrid assembled genomes as a gold standard method for determining plasmid content. Using a curated local database of 430 plasmid sequences, ACCIO provided accurate inferences of plasmid content from short-read (Illumina), long-read (Oxford Nanopore Technologies), and hybrid assemblies (both), ultimately facilitating genomic surveillance of plasmids regardless of sequencing technology. 
This work represents a meaningful step forward in advancing plasmid surveillance beyond the technological and infrastructural barriers that limit its broader expansion into healthcare and other settings.
4. Data summary
Short- and long-read sequencing data have been deposited in the NCBI Sequence Read Archive (SRA) under multiple BioProjects, and corresponding hybrid genome assemblies are available in GenBank. Accession numbers for all BioProjects, BioSamples, and SRA datasets are provided in Supplementary Data S1. All supporting data, software code, and experimental/analysis protocols are provided within the article or in supplementary data files. External validation of ACCIO used three external datasets (Cho et al. 2023, BioProjects PRJNA475751 and PRJNA874473, DOI: 10.1038/s41598-024-70540-1; Lipworth et al. 2024, BioProject: PRJNA604975, DOI: 10.1038/s41467-024-45761-7; Khezri et al. 2021, European Nucleotide Archive (ENA): PRJEB45084, DOI: 10.3390/microorganisms9122560).
List of External Software
MOB-suite (v3.1.9) - https://github.com/phac-nml/mob-suite
Skani (v0.2.2) - https://github.com/bluenote-1577/skani
Scipy (v1.16.1) - https://github.com/scipy/scipy
Pling (v2.0.0) - https://github.com/iqbal-lab-org/pling
MUMmer / NUCmer (v4.0.1) - https://mummer4.github.io/
Mash / Mash Screen (v2.3) - https://github.com/marbl/Mash
SPAdes (v3.15.5) - https://github.com/ablab/spades
Unicycler (v0.5.1) - https://github.com/rrwick/Unicycler
Flye (v2.9.5) - https://github.com/mikolmogorov/Flye
QUAST (v5.2.0) - https://github.com/ablab/quast
Kraken2 (v2.1.3) - https://github.com/DerrickWood/kraken2
CheckM (v0.4) - https://github.com/Ecogenomics/CheckM
Albacore/Guppy - [no longer officially hosted; was distributed by ONT]
Guppy - https://nanoporetech.com/software/other/guppy
Dorado - https://github.com/nanoporetech/dorado
Bowtie2 (v2.5.4) - https://github.com/BenLangmead/bowtie2
Minimap2 (v2.28) - https://github.com/lh3/minimap2
Biopython (v1.85) - https://biopython.org/
Pandas (v2.3.1) - https://pandas.pydata.org/
Plasme (v1.1) - https://github.com/HubertTang/PLASMe
BLAST (v2.17.0) - https://blast.ncbi.nlm.nih.gov/Blast.cgi
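The sensitivity and PPV figures quoted in the ACCIO abstract follow the standard definitions for detection performance. A minimal sketch of the arithmetic, assuming hypothetical counts of true-positive, false-positive, and false-negative plasmid-group calls (the function names and the counts are illustrative and are not taken from the study or from ACCIO's code):

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of truly present plasmid groups that were called."""
    return tp / (tp + fn)


def ppv(tp: int, fp: int) -> float:
    """Fraction of called plasmid groups that are truly present."""
    return tp / (tp + fp)


# Hypothetical counts, for illustration only (not data from the preprint).
tp, fp, fn = 90, 10, 30
print(f"sensitivity = {sensitivity(tp, fn):.1%}")  # 90 / (90 + 30)
print(f"PPV         = {ppv(tp, fp):.1%}")          # 90 / (90 + 10)
```

Reporting both values matters because a caller can reach high sensitivity simply by over-calling plasmids, which the PPV would then expose.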
van Eijndhoven, M. A. J.; Aparicio-Puerta, E.; Gomez-Martin, C.; Medina, J. M.; Drees, E. E. E.; Bradley, E. J.; Bosch, L.; Scheepbouwer, C.; Hackenberg, M.; Pegtel, D. M.
Terminal nucleotidyl transferases are enzymes that add non-templated nucleotides to RNA molecules. In the case of microRNAs, this process was shown to be functionally relevant for their maturation and for the generation of isomiRs with non-canonical mRNA targets. Deconvolution of these posttranscriptional modifications is challenging, in particular for extracellular miRNAs that are considered a target for minimally invasive diagnostics. Massively parallel RNA sequencing is the only method that can truthfully reveal isomiR diversity in biological samples and determine relative quantities. Improvements aside, current small RNA sequencing strategies remain imprecise. We developed IsoSeek, which diverges from these methods by making use of randomized 5'- and 3'-adapters combined with a 10N unique molecular identifier (UMI). Using synthetic miRNA and isomiR spike-in sets and testing depletion and RNA competition strategies in 7 sequencing rounds of >100 samples, we rigorously optimized and validated the technical accuracy of the IsoSeek method. In genetically altered HEK293 cells, we characterized the terminal uridylase (TUT4/TUT7) dependent miRNA uridylome and discovered extensive uridylation of disease-associated miRNAs. Notably, 3'-uridylated isomiR profiles of plasma extracellular vesicles (EVs) rely on UMI-correction. Thus, IsoSeek advances our knowledge of cell-free miRNAs and supports development into non-invasive biomarkers.
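To illustrate what UMI correction means for isomiR quantification, here is a minimal sketch that counts distinct molecules rather than reads. It assumes reads have already been parsed into (insert sequence, UMI) pairs and collapses only exact duplicates; the function name and the example reads are hypothetical, and this is not the IsoSeek pipeline itself.

```python
from collections import defaultdict


def umi_corrected_counts(reads):
    """Count unique molecules per isomiR sequence.

    `reads` is an iterable of (insert_sequence, umi) pairs. Reads that share
    both the insert sequence and the UMI are treated as PCR duplicates of a
    single original molecule and are counted once.
    """
    umis_per_seq = defaultdict(set)
    for seq, umi in reads:
        umis_per_seq[seq].add(umi)
    return {seq: len(umis) for seq, umis in umis_per_seq.items()}


# Illustrative reads: a canonical miRNA and a variant with one extra 3' T
# (uridine is read as T in the sequenced cDNA), each tagged with a 10-nt UMI.
reads = [
    ("TGAGGTAGTAGGTTGTATAGTT", "ACGTACGTAC"),
    ("TGAGGTAGTAGGTTGTATAGTT", "ACGTACGTAC"),   # PCR duplicate of the read above
    ("TGAGGTAGTAGGTTGTATAGTT", "TTGCAAGGCT"),
    ("TGAGGTAGTAGGTTGTATAGTTT", "GGATCCATGA"),
    ("TGAGGTAGTAGGTTGTATAGTTT", "GGATCCATGA"),  # PCR duplicate
]
print(umi_corrected_counts(reads))
```

Exact-match collapsing is the simplest choice; production pipelines often also merge UMIs that differ by a single sequencing error, which this sketch does not attempt.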
Ntekas, I.; Takayasu, L.; McKellar, D. W.; Grodner, B. M.; Holdener, C.; Schweitzer, P. A.; Sauthoff, M.; Shi, Q.; Brito, I. L.; De Vlaminck, I.
Inter-microbial and host-microbial interactions are thought to be critical for the functioning of the gut microbiome, but few tools are available to measure these interactions. Here, we report a method for unbiased spatial sampling of microbiome-host interactions in the gut at one micron resolution. This method combines enzymatic in situ polyadenylation of both bacterial and host RNA with spatial RNA-sequencing. Application of this method in a mouse model of intestinal neoplasia revealed the biogeography of the mouse gut microbiome as a function of location in the intestine, frequent strong inter-microbial interactions at short length scales, shaping of local microbiome niches by the host, and tumor-associated changes in the architecture of the host-microbiome interface. This method is compatible with broadly available commercial platforms for spatial RNA-sequencing, and can therefore be readily adopted to study the role of short-range, bidirectional host-microbe interactions in microbiome health and disease.
Mazelis, I.; Sun, H.; Kulkarni, A.; Torre, T.; Klein, A. M.
Single-cell sequencing methods uncover natural and induced variation between cells. Many functional genomic methods, however, require multiple steps that cannot yet be scaled to high throughput, including assays on living cells. Here we develop capsules with amphiphilic gel envelopes (CAGEs), which selectively retain cells and large analytes while being freely accessible to media, enzymes and reagents. Capsules enable high-throughput multi-step assays combining live-cell culture with genome-wide readouts. We establish methods for barcoding CAGE DNA libraries, and apply them to measure persistence of gene expression programs in cells by capturing the transcriptomes of tens of thousands of expanding clones in CAGEs. The compatibility of CAGEs with diverse enzymatic reactions will facilitate the expansion of the current repertoire of single-cell, high-throughput measurements and extension to live-cell assays.
Liao, H.; Kottapalli, S.; Huang, Y.; Chaw, M.; Gehring, J.; Waltner, O.; Phung-Rojas, M.; Daza, R. M.; Matsen, F. A.; Trapnell, C.; Shendure, J.; Srivatsan, S. R.
Spatial genomics technologies include imaging- and sequencing-based methods. Sequencing-based spatial methods typically require surfaces coated with coordinate-associated DNA barcodes, but the physical registration of these barcodes to spatial coordinates is challenging, necessitating either high density printing of oligonucleotides or in situ sequencing/probing of randomly deposited, DNA-barcode-bearing beads. As a consequence, the surface areas available to sequencing-based spatial genomic methods are constrained by the time, labor, cost and instrumentation required to either print or decode a coordinate-tagged surface. To address this challenge, we developed SCOPE (Spatial reConstruction via Oligonucleotide Proximity Encoding), an optics-free, DNA microscopy-inspired method. With SCOPE, the relative positions of DNA-barcoded beads within a 2D shape, 2D image or 3D volume are inferred from the ex situ sequencing of chimeric molecules formed from diffusing "sender" and tethered "receiver" oligonucleotides. To demonstrate the potential of this approach, we applied SCOPE to reconstruct 2D shapes, 2D images or 3D volumes defined by 10⁴-10⁶ x 20-100 µm DNA barcoded beads, including an asymmetric "swoosh" resembling the Nike logo (44 mm²), a "color" Snellen eye chart (704 mm²) and the surface topology of 3D molds of a teddy bear, star, butterfly or block letter (75-100 mm³). Each of the resulting "DNA barcode proximity graphs" was computationally reconstructed in an automated fashion, across fields of view and at resolutions that were determined by sequencing depth, bead size and diffusion kinetics, rather than by microarray or microscope instrument time. Because the ground truth shapes are known, these datasets may be particularly useful for the further development of computational algorithms by this nascent field.
Mullaney, D. B.; Sgrizzi, S. R.; Mai, D.; Campbell, I.; Huang, Y.; Sinkunas, A.; Kerr, D. L.; Browning, V. E.; Eisenach, H. E.; Sims, J. N.; Nichols, E. K.; Lapointe, C. P.; Amimura, Y.; Harris, K.; Zilionis, R.; Srivatsan, S. R.
Single-cell genomics methods have unveiled the heterogeneity present in seemingly homogeneous populations of cells; however, these techniques require meticulous optimization. How exactly does one handle and manipulate the biological contents from a single cell? Here, we introduce and characterize a novel semi-permeable capsule (SPC), capable of isolating single cells and their contents while facilitating biomolecular exchange based on size-selectivity. These capsules maintain stability under diverse physical and chemical conditions and allow selective diffusion of biomolecules, effectively retaining larger biomolecules, including genomic DNA and cellular complexes, while permitting the exchange of smaller molecules, including primers and enzymes. We demonstrate the utility of SPCs for single cell assays by performing the simultaneous culture of over 500,000 cellular colonies, demonstrating efficient and unbiased nucleic acid amplification, and performing combinatorial indexing-based single-cell whole genome sequencing (sc-WGS). Notably, SPC-based sc-WGS facilitates uniform genome coverage and minimal cross-contamination, allowing for the detection of genomic variants with high sensitivity and specificity. Leveraging these properties, we conducted a proof-of-concept lineage tracing experiment using cells harboring the hypermutator polymerase ε allele (POLE P286R). Sequencing of 1000 single cell genomes at low depth facilitated the capture of lineage marks deposited throughout the genome during each cell division and the subsequent reconstruction of cellular genealogies. Capsule-based sc-WGS expands the single-cell genomics toolkit and will facilitate the investigation of somatic variants, resolved to single cells at scale.
Bachelet, I.
The peer-reviewed journal article imposes structural constraints on the dissemination, validation, and reuse of research outputs. Intermediate results, negative findings, methodological refinements, and replication attempts are systematically underrepresented in published literature, limiting visibility into ongoing research activity for both scientists and mission-driven funders. Here we present Carrierwave, an open infrastructure for continuous, granular scientific communication built on structured research objects (ROs), cryptographic provenance, blockchain-based attribution, and programmable incentive mechanisms. Each RO represents an atomic unit of scientific output -- a single experimental result, negative finding, dataset, protocol, or replication -- that is hashed for content integrity, stored in a persistent database, and optionally minted as an ERC-721 non-fungible token on the Ethereum blockchain. The system includes an on-chain bounty pool enabling funders to directly incentivize specific research activities, and an automated analysis layer that synthesizes disclosed ROs into continuously updated research landscape maps. We describe the system architecture, report on its implementation and deployment on Ethereum mainnet, and present a quantitative analysis of disease-specific publication frequency demonstrating the information latency problem that Carrierwave addresses. The distribution of publication frequency across disease areas is highly skewed, with the majority of conditions represented by fewer than four publications per year in high-impact biology journals. For diseases in the long tail, the interval between successive publications may span months or years. Publication frequency correlates poorly with disease burden, instead reflecting historical research community size and advocacy momentum. 
By reducing the unit of communication to the individual research object and eliminating editorial gatekeeping as a prerequisite for disclosure, Carrierwave increases the effective sampling rate of scientific activity in precisely the domains where publication-based visibility is most sparse. The system is live at https://carrierwave.org.
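The abstract above states that each research object is hashed for content integrity. A minimal sketch of that idea, assuming a research object can be represented as a JSON-serializable dictionary and that SHA-256 over a canonical JSON encoding is an acceptable digest; the field names, the choice of hash, and the function name are all assumptions for illustration and are not taken from Carrierwave's implementation:

```python
import hashlib
import json


def research_object_hash(ro: dict) -> str:
    """Return a SHA-256 hex digest of a canonical JSON encoding of `ro`.

    Keys are sorted and whitespace is removed so that two dictionaries with
    the same content always produce the same digest.
    """
    canonical = json.dumps(
        ro, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical research object; every field here is illustrative.
ro = {
    "type": "negative_finding",
    "title": "Compound X shows no effect on target Y",
    "authors": ["A. Researcher"],
    "created": "2025-01-01",
}
digest = research_object_hash(ro)
print(digest)

# Any change to the content yields a different digest.
tampered = dict(ro, title="Compound X shows a strong effect on target Y")
print(research_object_hash(tampered) != digest)
```

Canonicalizing before hashing is the key design choice: without it, a harmless reordering of fields would change the digest and look like tampering.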
Flynn, R. A.; Ge, R.; Rai, S. K.; Coffey, R. J. A.; Jeppesen, D. K.; Zhang, Q.; Higginbotham, J. N.
Glycosylated RNAs (glycoRNAs) represent a recently discovered class of small RNAs, but their systematic characterization has been limited by reliance on metabolic chemical reporters and high RNA input requirements. Here we present rPAL sequencing (rPAL-seq), a sensitive and selective platform for de novo discovery of sialoglycoRNAs. rPAL-seq combines enhanced periodate oxidation of sialic acids with a capture-release workflow and optimized library construction using poly(A) extension coupled with template-switching reverse transcription. The method enabled reproducible profiling from less than 100 ng of input RNA, corresponding to less than 2% of the material required by previous approaches. When applied across 13 human cell lines, rPAL-seq identified lineage-associated glycoRNA patterns alongside a conserved core dominated by uridine-rich snRNAs and snoRNAs, with modification signatures implicating glycosylation on acp3U or related uridine-based modifications. Extending to extracellular vesicles and non-vesicular nanoparticles, rPAL-seq revealed secreted glycoRNA profiles distinct from those of the cellular fraction. rPAL-seq provides a robust, scalable strategy for glycoRNA profiling, opening new avenues for studying this emerging biopolymer.
Kraft, L.; Soeding, J.; Steinegger, M.; Jochheim, A.; Fernandez-Guerra, A.; Renaud, G.
De novo assembly of ancient metagenomic datasets is a challenging task. Ultra-short fragment size and characteristic postmortem damage patterns of sequenced ancient DNA molecules leave current tools ill-equipped for ideal assembly. We present CarpeDeam, a novel damage-aware de novo assembler designed specifically for ancient metagenomic samples. Utilizing maximum-likelihood frameworks that integrate sample-specific damage patterns, CarpeDeam demonstrates improved recovery of longer continuous sequences and protein sequences in many simulated and empirical datasets compared to existing assemblers. As a pioneering ancient metagenome assembler, CarpeDeam opens the door for new opportunities in functional and taxonomic analyses of ancient microbial communities.
Marafini, P.; Smith, D. G.; Lamstaes, A. R.; Contreras, R. E.; Williams, I.; West, I.; Ambridge, O.; Sanders-Brown, V.; Intaite, E.; Hii, C. Y.; Hume, B. C.; Munagala, U.; Plumbly, W.; Brown, F. L.; Shlyakhtina, Y.; Woods, L.; Bibby, J. A.; Williams, L.; Yang, J. H.; Steffy, B.; Zawada, L.; Harger, J. W.; McKenzie, D.; Laing, A. G.; Stubbington, M. J.; Edelman, L. B.
Existing tools for single cell genomics require complex physical frameworks for the indexing of cellular nucleic acids, including proprietary instrumentation, droplet emulsions, and laborious combinatorial indexing schemes. The complexity and cost of these tools significantly constrain the use of single cell technologies across basic and translational research. Here, we describe an instrument-free method that uses novel, bifunctional indexing reagents to deliver index sequences directly to single cells, followed by a biophysical process known as Kinetic Confinement to perform high-fidelity indexing of target molecules across thousands of single cells simultaneously in single-tube, solution-phase reactions. Kinetic Confinement enables simple, fast, and flexible single cell experiments, and allows straightforward scaling to very large sample numbers. We anticipate that assays based on Kinetic Confinement will significantly expand the scope, use, and impact of single cell analysis across fundamental and applied research, as well as within therapeutic development and ultimately applied clinical diagnostics.
Chen, H.-M.; Kao, J.-C.; Yang, C.-P.; Tan, C.; Lee, T.; Sugino, K.
The Smart-seq family of methods represents the gold standard for high-sensitivity, full-length single-cell RNA sequencing. Despite iterative improvements, fundamental challenges remain: the generation of non-specific PCR products that limit sensitivity, the inability to capture precise Transcription End Sites (TES), and the insidious generation of "phantom UMIs", artificial molecular barcodes created during PCR that systematically inflate molecular counts. Here, we present ESPeR-seq, a novel architecture that resolves these barriers. To enable precise, stranded TES capture, we developed an "Omega-dT" primer that bypasses synthetic poly-T tracts, restoring high-quality sequencing directly at transcript termini. To eliminate both PCR background and phantom UMIs, we implemented a biochemical "multi-lock" mechanism utilizing uracil-containing TSOs and a uracil-intolerant DNA polymerase. We validate this approach using the logQ-slope, a novel metric that sensitively diagnoses UMI fidelity. Benchmarking reveals that while state-of-the-art methods still exhibit signs of UMI inflation, ESPeR-seq strictly prevents it. Furthermore, the strandedness and precise end-delineation provided by TSO and dT reads support robust de novo gene model reconstruction, enabling the discovery of novel multi-exon genes, unannotated 3' UTR extensions, and candidate eRNAs across aggregated single-cell populations. Thus, ESPeR-seq establishes a robust framework for absolute quantitative accuracy and full-length isoform resolution.
Henderson, G.; Gudys, A.; Baharav, T.; Sundaramurthy, P.; Kokot, M.; Wang, P. L.; Deorowicz, S.; Carey, A.; Salzman, J.
Bacteria comprise >12% of Earth's biomass and profoundly impact human and planetary health [1]. Many key biological functions of microbes, and functions differentiating strains, are conferred or modified by genome plasticity including mobilization of genetic elements, phage integration, and CRISPR arrays. Characterizing each of these processes is time-consuming and requires custom bioinformatic workflows ill-suited to enable discovery of new sources of genetic diversity or to uncover which elements are active. Further, strain typing of bacterial species and approaches to discriminate sub-populations remain time-consuming and resource intensive. Here, we show that SPLASH, our published approach for reference-free discovery and analysis directly from raw reads, and an improved statistical assembly algorithm, compactors, unify diverse tasks in microbial sequence analysis: discovering new mobile elements and CRISPR arrays missing from any reference, and generating rapid, metadata-free strain typing of diverse bacteria. SPLASH and compactors together constitute a new general tool for biological discovery in the microbial world.
Langley, J.; Baudrier, L.; Curry, J.; Narta, K.; Todesco, H. M.; Potts, K.; Morrissy, S.; Mahoney, D. J.; Billon, P.
Engineered virus-like particles (eVLPs) enable transgene-free ribonucleoprotein delivery for genome editing applications, yet optimized delivery strategies for high-throughput applications remain unexplored. Prime editing enables precise genomic modifications but suffers from limited efficiency that constrains its widespread adoption. Here, we present PRIME-VLP (Progressive Repeated Infections for Maximized Editing via Virus-Like Particles), a delivery strategy that enhances prime editing efficiency for both targeted genome engineering and high-throughput prime editing screening. PRIME-VLP leverages the temporal dynamics of eVLP-mediated editing through multiple sequential transductions with sub-saturating eVLP doses delivered at optimal intervals. This approach achieves 1.5- to 2.8-fold improvements in editing efficiency across diverse genomic targets and cell types. PRIME-VLP maintains high specificity without increasing off-target effects, compromising cellular viability, or causing transcriptional perturbations. By decoupling pegRNA and editor delivery through pegRNA-free eVLPs, PRIME-VLP enables pooled prime editing screens, circumventing transgene silencing limitations of conventional lentiviral-based screens. Using a 6,000-pegRNA library targeting TP53, PRIME-VLP achieved 2.8-fold higher editing efficiency and improved reproducibility compared to conventional lentiviral delivery. An eVLP-based screen identified functional TP53 loss-of-function variants that confer resistance to MDM2 inhibition by Nutlin-3. This work expands the versatility of eVLPs beyond their current in vivo therapeutic applications, demonstrating their promise for high-throughput functional genomics by overcoming the delivery limitations of lentiviral systems.
Walton, R. T.; Qin, Y.; Blainey, P. C.
Forward genetic screens seek to dissect complex biological systems by systematically perturbing genetic elements and observing the resulting phenotypes. While standard screening methodologies introduce individual perturbations, multiplexing perturbations improves the performance of single-target screens and enables combinatorial screens for the study of genetic interactions. Current tools for multiplexing perturbations are limited by technical challenges and do not offer compatibility across diverse screening methodologies, including enrichment, single-cell sequencing, and optical pooled screens. Here, we report the development of CROPseq-multi (CSM), a CROPseq-inspired [1] lentiviral system to multiplex Streptococcus pyogenes (Sp) Cas9-based perturbations with versatile readout compatibility and high performance for both perturbation and barcode identification. CSM has equivalent per-guide activity to CROPseq and low lentiviral recombination frequencies. Dual-guide CSM libraries are constructed in a single, facile molecular cloning step that facilitates the use of unique molecular identifiers. CSM is compatible with enrichment screening methodologies, single-cell RNA-sequencing readouts, and optical pooled screens. For optical pooled screens, an optimized and multiplexed in situ detection protocol improves barcode counts 10-fold (for mRNA detection), enables detection of recombination events, and reduces the number of sequencing cycles required for decoding by 3-fold relative to CROPseq. CROPseq-multi-v2 (CSMv2) adds compatibility for detection methods based on T7 RNA polymerase in vitro transcription [2-5]. CSM provides a single system for CRISPR screens that is compatible with individual and combinatorial perturbations, diverse SpCas9-based perturbation technologies, and multiple high-content, single-cell phenotypic readouts.
Ghaddar, B.; Blaser, M. J.; De, S.
We developed SAHMI, a computational resource to identify truly present microbial nucleic acids and filter contaminants and spurious false-positive taxonomic assignments from standard transcriptomic sequencing of mammalian tissues. In benchmark studies, SAHMI correctly identifies known microbial infections present in diverse tissues. The application of SAHMI to single-cell and spatial genomic data enables co-detection of somatic cells and microorganisms and joint analysis of host-microbiome ecosystems.
Ziemski, M.; Gehret, L.; Simard, A.; Castro Dau, S.; Risch, V.; Grabocka, D.; Matzoros, C.; Wood, C.; Momo Cabrera, P.; Hernandez-Velazquez, R.; Herman, C.; Evans, K.; Robeson, M. S.; Bolyen, E.; Caporaso, J. G.; Bokulich, N. A.
Metagenome sequencing has revolutionized functional microbiome analysis across diverse ecosystems, but is fraught with technical hurdles. We introduce MOSHPIT (https://moshpit.readthedocs.io), software built on the QIIME 2 framework (Q2F) that integrates best-in-class CAMI2-validated metagenome tools with robust provenance tracking and multiple user interfaces, enabling streamlined, reproducible metagenome analysis for all expertise levels. By building on Q2F, MOSHPIT enhances scalability, interoperability, and reproducibility in complex workflows, democratizing and accelerating discovery at the frontiers of metagenomics.
Li, M.; Zhai, X.; Li, J.; Li, S.; Du, Y.; Zhang, J.; Zhang, R.; Luo, Y.; Wei, W.; Liu, Y.
Microbial communities are extraordinarily diverse and play crucial roles in health and disease, yet current methods lack the resolution and scalability needed to dissect their genomic and ecological complexity at the single-cell level. Here, we present CAP-seq, a high-throughput single-microbe genomics platform that combines hydrogel-based semi-permeable encapsulation with minimal microfluidics to recover thousands of single-amplified genomes (SAGs) with long reads and high completeness at low sequencing depth. We benchmarked CAP-seq using defined microbial communities, demonstrating strain-level resolution, accurate detection of rare taxa, and genome recovery exceeding 50% at ~10x coverage. Applying CAP-seq to pediatric Clostridioides difficile infection microbiomes, we generated a high-resolution single-cell atlas comprising tens of thousands of SAGs across hundreds of species. Host-resolved profiling of the cryptic plasmid pBI143 revealed previously hidden low-abundance host associations, six new plasmid versions, and their coexistence within individuals, indicating complex plasmid evolution in situ. Longitudinal analysis during fecal microbiota transplantation and vancomycin treatment uncovered dynamic remodeling of microbial hosts, antimicrobial resistance genes, and plasmids at single-cell resolution. CAP-seq enables scalable, high-performance single-cell genomics and provides a practical, widely accessible platform for microbiome analysis, paving the way for large-scale exploration of microbial dark matter and host-microbe interactions across diverse ecosystems.
Battistoni, G.; Garcia, S. T.; Sia, C. Y.; Corriero, S.; Boquetale, C.; Williams, E. G.; Alini, M.; Hemmer, N.; IMAXT Cancer Grand Challenge Consortium; Balasubramanian, S.; Nicholson, B. C.; Hannon, G. J.; Bressan, D.
Mapping the molecular identities and functions of cells within their spatial context is key to understanding the complex interplay within and between tissue neighbourhoods. A wide range of methods have recently enabled spatial profiling of cellular anatomical contexts, some offering single-cell resolution. These use different barcoding schemes to encode either the location or the identity of target molecules. However, all these technologies face a trade-off between spatial resolution, depth of profiling, and scalability. Here, we present Barcoding by Activated Linkage of Indexes (BALI), a method that uses light to write combinatorial spatial molecular barcodes directly onto target molecules in situ, enabling multi-omic profiling by next generation sequencing. A unique feature of BALI is that the user can define the number, size, shape, and resolution of the spatial locations to be interrogated, with the potential to profile millions of distinct regions with subcellular precision. As a proof of concept, we used BALI to capture the transcriptome, chromatin accessibility, or both simultaneously, from distinct areas of the mouse brain in single tissue sections, demonstrating strong concordance with publicly available datasets. BALI therefore combines high spatial resolution, high throughput, histological compatibility, and workflow accessibility to enable powerful spatial multi-omic profiling.
Luo, G.; Zang, Z.; Yuan, L.; Zhou, J.; Dong, A.; Huang, Y.; Li, S. Z.; Ju, F.
The discovery of RNA viruses from metatranscriptomic data remains challenging due to their extreme sequence divergence and frequent lack of conserved motifs. We present Rider, a lightweight two-stage framework that couples fast, structure-informed sequence screening with targeted structural validation. Stage 1 uses a compact 35M-parameter protein language model to prioritize RdRp-like fragments at whole-sample scale, achieving over 44x higher end-to-end screening throughput on commodity hardware. Stage 2 applies structure prediction and Foldseek-based alignment against a dedicated RdRp structure resource (~200k ESMFold-predicted structures), providing orthogonal evidence for remote homologs. Applied to >10,000 metatranscriptomes spanning marine, freshwater, soil and host-associated microbiomes, Rider matches or outperforms leading tools (e.g., LucaProt, PalmScan) and additionally recovers divergent and truncated sequences. Multiple orthogonal indicators, including structure consistency and low DNA read mapping to corresponding contigs, support genuine RNA origin. In a human IBD cohort, Rider agrees with state-of-the-art calls for clinically relevant RNA viruses while extending discovery to divergent lineages. Rider turns structure-guided homology search into a practical, scalable pipeline for RNA virome discovery.
Highlights
A two-stage framework enables structure-guided RNA virus discovery at sample scale, achieving up to 44-fold higher throughput on standard computing hardware.
The method matches or surpasses LucaProt and PalmScan across >10,000 metatranscriptomes from diverse environments, while recovering RdRp fragments missed by existing tools.
Structural validation using ~200,000 ESMFold-predicted RdRp models and Foldseek alignment supports the detection of remote homologs with high confidence.
Orthogonal evidence, including low DNA read mapping, strand-specific expression, and ORF metrics, confirms RNA origin and reduces false positives.
Open-source code and an openly released RdRp structure database enable scalable, reproducible RNA virome discovery in environmental and clinical settings.
Baysoy, A.; Tian, X.; Zhang, F.; Renauer, P.; Bai, Z.; Shi, H.; Li, H.; Tao, B.; Yang, M.; Enninful, A.; Gao, F.; Wang, G.; Zhang, W.; Tran, T.; Patterson, N. H.; Bao, S.; Dong, C.; Xin, S.; Zhong, M.; Rankin, S.; Guy, C.; Wang, Y.; Connelly, J. P.; Pruett-Miller, S. M.; Chi, H.; Chen, S.; Fan, R.
Perturb-seq enabled the profiling of transcriptional effects of genetic perturbations in single cells but lacks the ability to examine the impact on tissue environments. We present Perturb-DBiT for simultaneous co-sequencing of spatial transcriptome and guide RNAs (gRNAs) on the same tissue section for in vivo CRISPR screens with genome-scale gRNA libraries, offering a comprehensive understanding of how genetic modifications affect cellular behavior and tissue architecture. This platform supports a variety of delivery vectors, gRNA library sizes, and tissue preparations, along with two distinct gRNA capture methods, making it adaptable to a wide range of experimental setups. In applying Perturb-DBiT, we conducted unbiased knockouts of tens of genes or at genome-wide scale across three cancer models. We mapped all gRNAs in individual colonies and corresponding transcriptomes in a human cancer metastatic colonization model, revealing clonal dynamics and cooperation. We also examined the effect of genetic perturbation on the tumor immune microenvironment in an immune-competent syngeneic model, uncovering differential and synergistic perturbations in promoting immune infiltration or suppression in tumors. Perturb-DBiT allows for simultaneously evaluating the impact of each knockout on tumor initiation, development, metastasis, histopathology, and immune landscape. Ultimately, it not only broadens the scope of genetic inquiry, but also lays the groundwork for developing targeted therapeutic strategies.